 naive monte carlo



Efficient Black-box Assessment of Autonomous Vehicle Safety

arXiv.org Machine Learning

While autonomous vehicle (AV) technology has shown substantial progress, we still lack tools for rigorous and scalable testing. Real-world testing, the de facto evaluation method, is dangerous to the public. Moreover, due to the rare nature of failures, billions of miles of driving are needed to statistically validate performance claims. Thus, the industry has largely turned to simulation to evaluate AV systems. However, having a simulation stack alone is not a solution. A simulation testing framework needs to prioritize which scenarios to run, learn how the chosen scenarios provide coverage of failure modes, and rank failure scenarios in order of importance. We implement a simulation testing framework that evaluates an entire modern AV system as a black box. This framework estimates the probability of accidents under a base distribution governing standard traffic behavior. In order to accelerate rare-event probability evaluation, we efficiently learn to identify and rank failure scenarios via adaptive importance-sampling methods. Using this framework, we conduct the first independent evaluation of a full-stack commercial AV system, Comma AI's OpenPilot.
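The adaptive importance-sampling idea mentioned in this abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a one-dimensional standard normal base distribution, a hypothetical failure event `x > threshold`, and a cross-entropy-style update of a Gaussian proposal mean; the function name `cross_entropy_tail` and all of its parameters are illustrative.

```python
import math
import random


def cross_entropy_tail(threshold, n_per_iter=5000, elite_frac=0.1,
                       max_iter=20, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) with an adaptive proposal.

    Toy assumption: the proposal is N(mu, 1) and only mu is adapted.
    Returns (probability estimate, final proposal mean).
    """
    rng = random.Random(seed)
    mu = 0.0  # start from the base distribution itself
    for _ in range(max_iter):
        xs = sorted(rng.gauss(mu, 1.0) for _ in range(n_per_iter))
        # Intermediate level: the (1 - elite_frac) sample quantile,
        # capped at the true failure threshold.
        idx = min(int((1.0 - elite_frac) * n_per_iter), n_per_iter - 1)
        level = min(xs[idx], threshold)
        # Move mu to the likelihood-ratio-weighted mean of the elite samples.
        num = 0.0
        den = 0.0
        for x in xs:
            if x >= level:
                w = math.exp(-mu * x + 0.5 * mu * mu)  # p(x) / q(x)
                num += w * x
                den += w
        mu = num / den
        if level >= threshold:
            break
    # Final importance-sampling estimate using the adapted proposal.
    total = 0.0
    for _ in range(n_per_iter):
        x = rng.gauss(mu, 1.0)
        if x > threshold:
            total += math.exp(-mu * x + 0.5 * mu * mu)
    return total / n_per_iter, mu


if __name__ == "__main__":
    estimate, final_mu = cross_entropy_tail(4.0)
    exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))
    print(f"estimate={estimate:.3e} exact={exact:.3e} mu={final_mu:.2f}")
```

In this toy setting the proposal mean drifts toward the failure region over a few iterations, so most of the final samples land where failures occur and are then reweighted by the likelihood ratio. A real AV evaluation would replace the scalar `x` with a full scenario parameterization and the threshold test with a simulator rollout.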


Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation

Neural Information Processing Systems

While recent developments in autonomous vehicle (AV) technology highlight substantial progress, we lack tools for rigorous and scalable testing. Real-world testing, the de facto evaluation environment, places the public in danger, and, due to the rare nature of accidents, will require billions of miles in order to statistically validate performance claims. We implement a simulation framework that can test an entire modern autonomous driving system, including, in particular, systems that employ deep-learning perception and control algorithms. Using adaptive importance-sampling methods to accelerate rare-event probability evaluation, we estimate the probability of an accident under a base distribution governing standard traffic behavior. We demonstrate our framework on a highway scenario, accelerating system evaluation by 2-20 times over naive Monte Carlo sampling methods and 10-300P times (where P is the number of processors) over real-world testing.
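To make the comparison with naive Monte Carlo concrete, here is a minimal sketch on a toy rare event rather than a driving simulator. It assumes a standard normal base distribution and a hypothetical failure event `x > threshold`; the fixed shifted-Gaussian proposal and the function names are illustrative choices, not taken from the paper.

```python
import math
import random


def naive_mc(threshold, n, rng):
    """Fraction of N(0, 1) draws that exceed the threshold."""
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > threshold)
    return hits / n


def importance_sampling(threshold, n, rng, shift=None):
    """Estimate P(X > threshold), X ~ N(0, 1), sampling from N(shift, 1)."""
    mu = threshold if shift is None else shift
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu, 1.0)
        if x > threshold:
            # Likelihood ratio of base density to proposal density.
            total += math.exp(-mu * x + 0.5 * mu * mu)
    return total / n


def exact_tail(threshold):
    """Closed-form Gaussian tail probability, used only as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


if __name__ == "__main__":
    rng = random.Random(0)
    t, n = 4.0, 20000
    print("exact     ", exact_tail(t))
    print("naive MC  ", naive_mc(t, n, rng))
    print("importance", importance_sampling(t, n, rng))
```

With a threshold of 4 the exact tail probability is about 3.2e-5, so 20,000 naive samples are expected to contain fewer than one failure, and the naive estimate is usually zero. The importance-sampling estimate uses the same number of samples but draws them near the failure region, which is the mechanism behind the speedups the abstract reports.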


Scalable End-to-End Autonomous Vehicle Testing via Rare-event Simulation

arXiv.org Machine Learning

Recent breakthroughs in deep learning have accelerated the development of autonomous vehicles (AVs); many research prototypes now operate on real roads alongside human drivers. While advances in computer-vision techniques have made human-level performance possible on narrow perception tasks such as object recognition, several fatal accidents involving AVs underscore the importance of testing whether the perception and control pipeline, when considered as a whole system, can safely interact with humans. Unfortunately, testing AVs in real environments, the most straightforward validation framework for system-level input-output behavior, requires prohibitive amounts of time due to the rare nature of serious accidents [49]. Concretely, a recent study [29] argues that AVs need to drive "hundreds of millions of miles and, under some scenarios, hundreds of billions of miles to create enough data to clearly demonstrate their safety." Alternatively, formally verifying an AV algorithm's "correctness" [34, 2, 47, 37] is difficult since all driving policies are subject to crashes caused by other drivers [49]. It is unreasonable to ask that the policy be safe under all scenarios. Unfortunately, ruling out scenarios where the AV should not be blamed is a task subject to logical inconsistency, combinatorial growth in specification complexity, and subjective assignment of fault. Motivated by the challenges underlying real-world testing and formal verification, we consider a probabilistic paradigm, which we call a risk-based framework, where our goal is to evaluate the probability of an accident under a base distribution representing standard traffic behavior.
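The scale of testing described here follows from a standard calculation for naive sampling. As a worked sketch, let $p$ be the accident probability per independent trial (one mile, say) and $\hat{p}_N$ the fraction of failures observed in $N$ trials. The symbols $p$, $N$, $\epsilon$ and the numerical values below are assumptions chosen for illustration, not figures from the paper.

$$
\operatorname{Var}(\hat{p}_N) = \frac{p(1-p)}{N},
\qquad
\frac{\sqrt{\operatorname{Var}(\hat{p}_N)}}{p} = \sqrt{\frac{1-p}{Np}} \le \epsilon
\iff
N \ge \frac{1-p}{p\,\epsilon^{2}}.
$$

For a hypothetical $p = 10^{-8}$ and a target relative error of $\epsilon = 0.1$, this gives $N \ge (1 - 10^{-8}) / (10^{-8} \cdot 10^{-2}) \approx 10^{10}$ trials. The required sample size grows as $1/p$, which is why rare failures make naive estimation so expensive.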